Opened 9 years ago

Closed 9 years ago

#3131 closed defect (fixed)

KNN geography still gives ERROR: index returned tuples in wrong order

Reported by: robe Owned by: pramsey
Priority: blocker Milestone: PostGIS 2.2.0
Component: postgis Version: master
Keywords: Cc:

Description

I thought we were over this issue, but in trying to troubleshoot why my regress geography weren't using an idnex (which I thought they were before), I am getting back this error:

Steps to produce:

CREATE TABLE knn_recheck_geog(gid serial primary key, geog geography);
INSERT INTO knn_recheck_geog(gid,geog)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, ST_Point(x*1.11,y*0.95)::geography As geog
FROM generate_series(-100,100, 1) AS x CROSS JOIN generate_series(-90,90,1) As y;

INSERT INTO knn_recheck_geog(gid, geog)
SELECT 500000, 'LINESTRING(-95 -10, -11 65, 5 10, -70 60)'::geography;

INSERT INTO knn_recheck_geog(gid, geog)
SELECT 500001, 'POLYGON((-95 10, -95.6 10.5, -95.9 10.75, -95 10))'::geography;

INSERT INTO knn_recheck_geog(gid,geog)
SELECT 600000 + ROW_NUMBER() OVER(ORDER BY gid) AS gid, ST_Buffer(geog,1000) As geog
FROM knn_recheck_geog
WHERE gid IN(1000, 10000, 2000, 2614, 40000);

CREATE INDEX idx_knn_recheck_geog_gist ON knn_recheck_geog USING gist(geog);

set enable_seqscan = false;
SELECT gid
FROM knn_recheck_geog
ORDER BY 'POINT(95 10)'::geography <-> geog LIMIT 5;

Out comes:

ERROR:  index returned tuples in wrong order

My postgis_full_version

SELECT postgis_full_version() || ' ' || ' ' || version();
POSTGIS="2.2.0dev r13565" GEOS="3.5.0dev-CAPI-1.9.0 r4034" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24" LIBXML="2.7.8" LIBJSON="0.12" RASTER  PostgreSQL 9.5devel on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit

Change History (17)

comment:1 by robe, 9 years ago

Try this:

CREATE TABLE knn_recheck_geog_small (
    gid integer,
    geog geography
);



INSERT INTO knn_recheck_geog_small VALUES (33768, '0101000020E61000003D0AD7A370DD57406666666666E62440');
INSERT INTO knn_recheck_geog_small VALUES (33767, '0101000020E61000003D0AD7A370DD57400000000000002340');
INSERT INTO knn_recheck_geog_small VALUES (33587, '0101000020E610000066666666669657406666666666E62440');
INSERT INTO knn_recheck_geog_small VALUES (33586, '0101000020E610000066666666669657400000000000002340');
INSERT INTO knn_recheck_geog_small VALUES (33769, '0101000020E61000003D0AD7A370DD5740CDCCCCCCCCCC2640');
INSERT INTO knn_recheck_geog_small VALUES (33766, '0101000020E61000003D0AD7A370DD57409A99999999192140');
INSERT INTO knn_recheck_geog_small VALUES (33588, '0101000020E61000006666666666965740CDCCCCCCCCCC2640');
INSERT INTO knn_recheck_geog_small VALUES (33585, '0101000020E610000066666666669657409A99999999192140');
INSERT INTO knn_recheck_geog_small VALUES (33949, '0101000020E610000014AE47E17A2458406666666666E62440');
INSERT INTO knn_recheck_geog_small VALUES (33948, '0101000020E610000014AE47E17A2458400000000000002340');

CREATE INDEX idx_knn_recheck_geog_small_gist ON knn_recheck_geog_small USING gist(geog);


SELECT gid
FROM knn_recheck_geog_small
ORDER BY 'POINT(95 10)'::geography <-> geog LIMIT 1;

comment:2 by robe, 9 years ago

knn-recheck regress is now crashing at: r13584 (think it probably started at r13582

Creating database 'postgis_reg'
Loading PostGIS into 'postgis_reg'
PostgreSQL 9.5devel on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit
  Postgis 2.2.0dev - r13584 - 2015-05-29 22:10:13
  scripts 2.2.0dev r13584
  GEOS: 3.5.0dev-CAPI-1.9.0 r4034
  PROJ: Rel. 4.8.0, 6 March 2012

I'll upgrade my 9.5 to latest and post a backtrace here if its still crashing after.

Last edited 9 years ago by robe (previous) (diff)

comment:3 by robe, 9 years ago

I'll take another backtrace after I'm done rebuilding postgres, but this is what I have from weekold postgresql 9.5 after recent postgis install: running this query -

SELECT gid, RANK() OVER(ORDER BY ST_Distance( 'POINT(95 10)'::geography, geog) )
FROM knn_recheck_geog
ORDER BY 'POINT(95 10)'::geography <-> geog LIMIT 5;
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 12180.0x2e10]
0x00000000708a3324 in geography_distance_uncached (fcinfo=fcinfo@entry=0x273e670) at geography_measurement.c:86
86                      tolerance = PG_GETARG_FLOAT8(2);
(gdb) bt
#0  0x00000000708a3324 in geography_distance_uncached (fcinfo=fcinfo@entry=0x273e670) at geography_measurement.c:86
#1  0x0000000000782ebf in DirectFunctionCall3Coll (func=func@entry=0x708a3200 <geography_distance_uncached>, collation=collation@entry=0, arg1=<optimized out>, arg2=<optimized out>, arg3=arg3@entry=0) at fmgr.c:1074
#2  0x00000000708a33f3 in geography_distance_knn (fcinfo=<optimized out>) at geography_measurement.c:58
#3  0x000000000057c15b in ExecMakeFunctionResultNoSets (fcache=0x4b5e820, econtext=0x4b5e0c0, isNull=0x4c380da "", isDone=<optimized out>) at execQual.c:2018
#4  0x00000000005820c0 in ExecTargetList (isDone=0x273ebfc, itemIsDone=0x4c38270, isnull=0x4c380d8 "", values=0x4c380a0, econtext=0x4b5e0c0, targetlist=0x4c38210) at execQual.c:5363
#5  ExecProject (projInfo=<optimized out>, isDone=isDone@entry=0x273ebfc) at execQual.c:5578
#6  0x000000000059eb33 in ExecWindowAgg (winstate=winstate@entry=0x4b5dde8) at nodeWindowAgg.c:1746
#7  0x000000000057b0b8 in ExecProcNode (node=node@entry=0x4b5dde8) at execProcnode.c:500
#8  0x0000000000598459 in ExecSort (node=node@entry=0x4b5db50) at nodeSort.c:103
#9  0x000000000057b028 in ExecProcNode (node=node@entry=0x4b5db50) at execProcnode.c:488
#10 0x0000000000590d90 in ExecLimit (node=node@entry=0x4b5d810) at nodeLimit.c:91
#11 0x000000000057aed8 in ExecProcNode (node=node@entry=0x4b5d810) at execProcnode.c:520
#12 0x0000000000577c7e in ExecutePlan (dest=0x4c30398, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x4b5d810, estate=0x4b5d6f8) at execMain.c:1549
#13 standard_ExecutorRun (queryDesc=0x4b49728, direction=<optimized out>, count=0) at execMain.c:337
#14 0x000000000068bd58 in PortalRunSelect (portal=portal@entry=0x4b51698, forward=forward@entry=1 '\001', count=0, count@entry=41153104, dest=dest@entry=0x0) at pquery.c:946
#15 0x000000000068d356 in PortalRun (portal=0x273eeb0, portal@entry=0x4b51698, count=41153104, count@entry=2147483647, isTopLevel=isTopLevel@entry=0 '\000', dest=0x0, dest@entry=0x4c30398, altdest=altdest@entry=0x4c30398, compl
#16 0x000000000068ac54 in exec_simple_query (query_string=0x0) at postgres.c:1104
#17 PostgresMain (argc=<optimized out>, argv=argv@entry=0x2a81a8, dbname=0x10000f000e000d <error: Cannot access memory at address 0x10000f000e000d>, username=<optimized out>) at postgres.c:4025
#18 0x000000000062ac03 in BackendRun (port=0x273f400) at postmaster.c:4162
#19 SubPostmasterMain (argc=argc@entry=3, argv=argv@entry=0x317fa0) at postmaster.c:4649
#20 0x00000000007c6b60 in main (argc=3, argv=0x317fa0) at main.c:198

comment:4 by robe, 9 years ago

still crashes with latest 9.5

comment:5 by pramsey, 9 years ago

Should be better at r13587

comment:6 by pramsey, 9 years ago

Resolution: fixed
Status: newclosed

comment:7 by robe, 9 years ago

Resolution: fixed
Status: closedreopened

nope knn_check regress still crashing for me at r13587 and with newest PostgreSQL 9.5 ( http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ac6f22957d2f2999034b6a14d0d4bee25ba95f04 )

POSTGIS="2.2.0dev r13587" GEOS="3.5.0dev-CAPI-1.9.0 r4034" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.7.8" LIBJSON="0.12"

Back trace looks like this now:

[Switching to Thread 2580.0x2014]
0x00000000708a3314 in geography_distance_uncached (fcinfo=fcinfo@entry=0x273e690) at geography_measurement.c:87
87                      tolerance = PG_GETARG_FLOAT8(2);
(gdb) bt
#0  0x00000000708a3314 in geography_distance_uncached (fcinfo=fcinfo@entry=0x273e690) at geography_measurement.c:87
#1  0x0000000000782fa4 in DirectFunctionCall4Coll (func=func@entry=0x708a3200 <geography_distance_uncached>, collation=collation@entry=0, arg1=<optimized out>, arg2=<optimized out>, arg3=arg3@entry=0, arg4=arg4@entry=0) at fmgr.c:1101
#2  0x00000000708a33ec in geography_distance_knn (fcinfo=<optimized out>) at geography_measurement.c:58
#3  0x000000000057c19b in ExecMakeFunctionResultNoSets (fcache=0x4da1398, econtext=0x4da11d0, isNull=0x4da24f1 "\177~\177\177\177\177\177\060x\037", isDone=<optimized out>) at execQual.c:2018
#4  0x0000000000582100 in ExecTargetList (isDone=0x273ebec, itemIsDone=0x4da2640, isnull=0x4da24f0 "", values=0x4da24c8, econtext=0x4da11d0, targetlist=0x4da2608) at execQual.c:5363
#5  ExecProject (projInfo=projInfo@entry=0x4da2510, isDone=isDone@entry=0x273ebec) at execQual.c:5578
#6  0x0000000000582550 in ExecScan (node=node@entry=0x4da10b8, accessMtd=accessMtd@entry=0x597900 <SeqNext>, recheckMtd=recheckMtd@entry=0x5978f0 <SeqRecheck>) at execScan.c:207
#7  0x0000000000597973 in ExecSeqScan (node=node@entry=0x4da10b8) at nodeSeqscan.c:114
#8  0x000000000057aff8 in ExecProcNode (node=node@entry=0x4da10b8) at execProcnode.c:412
#9  0x0000000000598499 in ExecSort (node=node@entry=0x4da0e20) at nodeSort.c:103
#10 0x000000000057b068 in ExecProcNode (node=node@entry=0x4da0e20) at execProcnode.c:488
#11 0x0000000000590dd0 in ExecLimit (node=node@entry=0x4da0ae0) at nodeLimit.c:91
#12 0x000000000057af18 in ExecProcNode (node=node@entry=0x4da0ae0) at execProcnode.c:520
#13 0x0000000000577cbe in ExecutePlan (dest=0x4dd5868, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x4da0ae0, estate=0x4da09c8) at execMain.c:1549
#14 standard_ExecutorRun (queryDesc=0x4e61158, direction=<optimized out>, count=0) at execMain.c:337
#15 0x000000000068bde8 in PortalRunSelect (portal=portal@entry=0x4d98988, forward=forward@entry=1 '\001', count=0, count@entry=41153104, dest=dest@entry=0x0) at pquery.c:946
#16 0x000000000068d3e6 in PortalRun (portal=0x273eeb0, portal@entry=0x4d98988, count=41153104, count@entry=2147483647, isTopLevel=isTopLevel@entry=0 '\000', dest=0x0, dest@entry=0x4dd5868, altdest=altdest@entry=0x4dd5868, completionTag=0x273f210 "", completionTag@entry=0x40000000057 <error:
#17 0x000000000068ace4 in exec_simple_query (query_string=0x0) at postgres.c:1104
#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0xf81a8, dbname=0x10000f000e000d <error: Cannot access memory at address 0x10000f000e000d>, username=<optimized out>) at postgres.c:4025
#19 0x000000000062ac43 in BackendRun (port=0x273f400) at postmaster.c:4162
#20 SubPostmasterMain (argc=argc@entry=3, argv=argv@entry=0x167fa0) at postmaster.c:4649
#21 0x00000000007c6b70 in main (argc=3, argv=0x167fa0) at main.c:198
(gdb)

which doesn't look all that different except has DirectFunctionCall4Coll now instead of DirectFunctionCall3Coll

Last edited 9 years ago by robe (previous) (diff)

comment:8 by robe, 9 years ago

FWIW - just ran 9.5 on debbie, and she crashes too so not just a windows / mingw issue

http://debbie.postgis.net:8080/view/PostGIS/job/PostGIS_Regress_PGDEV_Weekly/168/consoleFull

Perhaps its time I just add 9.5 to our regular job run.

comment:9 by pramsey, 9 years ago

How about r13588?

comment:10 by robe, 9 years ago

okay debbie is not crashing any more. But I still need to change regress test and verify underlying issue was fixed.

comment:11 by robe, 9 years ago

Resolution: fixed
Status: reopenedclosed

this one is fixed, but I think I have to reopen the other ticket cause I see an issue here.

comment:12 by robe, 9 years ago

Resolution: fixed
Status: closedreopened

okeedok nothing like breaking one thing to fix another. My commit at r13590 fixed #3127 but broke this one again:

though now I think I can only exercise with a large test like:

set enable_seqscan = true; --with index use fails (when use_spheroid = true )
SELECT gid --, RANK() OVER(ORDER BY ST_Distance( 'POINT(95 10)'::geography, geog) )
FROM knn_recheck_geog
ORDER BY 'POINT(95 10)'::geography <-> geog LIMIT 5;

I can't get the small ones to use an index anymore — and if I unremark out the rank in above, also refuses to use an index.

I'm okay with sacrificing and just putting in a note in docs tht ↔ uses sphere instead of spheroid. Unless you think there is an easy fix or its just uncovering a more serious issue.

Last edited 9 years ago by robe (previous) (diff)

comment:13 by robe, 9 years ago

Not sure this is helpful or not, but I traced where its failing in postgresql code (it's in the backend/executor/nodeIndexscan.c line 251) and revised to output what it's getting. The issue is that cmp variable is returning -1 (the dump of cmp is what I added to see why it's falling into that loop)

		if (scandesc->xs_recheckorderby)
		{
			econtext->ecxt_scantuple = slot;
			ResetExprContext(econtext);
			EvalOrderByExpressions(node, econtext);

			/*
			 * Was the ORDER BY value returned by the index accurate?  The
			 * recheck flag means that the index can return inaccurate values,
			 * but then again, the value returned for any particular tuple
			 * could also be exactly correct.  Compare the value returned by
			 * the index with the recalculated value.  (If the value returned
			 * by the index happened to be exact right, we can often avoid
			 * pushing the tuple to the queue, just to pop it back out again.)
			 */
			cmp = cmp_orderbyvals(node->iss_OrderByValues,
								  node->iss_OrderByNulls,
								  scandesc->xs_orderbyvals,
								  scandesc->xs_orderbynulls,
								  node);
			if (cmp < 0)
				elog(ERROR, "index returned tuples in wrong order: the cmp is %d", cmp);
			else if (cmp == 0)
				was_exact = true;
			else
				was_exact = false;
			lastfetched_vals = node->iss_OrderByValues;
			lastfetched_nulls = node->iss_OrderByNulls;
		}
		else
		{
			was_exact = true;
			lastfetched_vals = scandesc->xs_orderbyvals;
			lastfetched_nulls = scandesc->xs_orderbynulls;
		}
ERROR:  index returned tuples in wrong order: the cmp is -1

comment:14 by pramsey, 9 years ago

Change the docs. ↔ will only return sphere distance, it is required to be that way to harmonize the way the key test in hte index calculates distance with how the "exact" operator tests. ↔ should only be used for ordering, not measuring, if you want a distance, use ST_Distance, that's what the docs should say

comment:15 by robe, 9 years ago

Okay will do and I'll revert my change on the other and mark that one as won't fix after I fix up the tests to assume distance sphere

comment:16 by robe, 9 years ago

should be fixed at r13593 by reverting back to using sphere distance and put a note in docs that its based on sphere distance. I still need to put in a note in docs saying use ST_Distance for accurate measure, but that will come with an example. Also have changed the tests to avoid using rank in conjunction with index tests at r13594, I'm not sure why that forces it not to use an index. Have to double check that one. That doesn't make sense to me.

comment:17 by robe, 9 years ago

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.